Overview

Dataset statistics

Number of variables: 15
Number of observations: 6362620
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 728.1 MiB
Average record size in memory: 120.0 B

Variable types

Numeric: 9
Categorical: 6

Warnings

name_orig has high cardinality: 6353307 distinct values [High cardinality]
name_dest has high cardinality: 2722362 distinct values [High cardinality]
step is highly correlated with days [High correlation]
amount is highly correlated with error_orig [High correlation]
oldbalance_orig is highly correlated with newbalance_orig [High correlation]
newbalance_orig is highly correlated with oldbalance_orig [High correlation]
oldbalance_dest is highly correlated with newbalance_dest [High correlation]
newbalance_dest is highly correlated with oldbalance_dest [High correlation]
error_orig is highly correlated with amount [High correlation]
days is highly correlated with step [High correlation]
dest_type is highly correlated with type [High correlation]
type is highly correlated with dest_type [High correlation]
amount is highly skewed (γ1 = 30.99394956) [Skewed]
error_orig is highly skewed (γ1 = -30.07474652) [Skewed]
error_dest is highly skewed (γ1 = -49.20227538) [Skewed]
name_orig is uniformly distributed [Uniform]
oldbalance_orig has 2102449 (33.0%) zeros [Zeros]
newbalance_orig has 3609566 (56.7%) zeros [Zeros]
oldbalance_dest has 2704388 (42.5%) zeros [Zeros]
newbalance_dest has 2439433 (38.3%) zeros [Zeros]
error_orig has 414047 (6.5%) zeros [Zeros]
error_dest has 719354 (11.3%) zeros [Zeros]

Reproduction

Analysis started: 2021-01-22 02:37:16.825966
Analysis finished: 2021-01-22 03:27:44.182787
Duration: 50 minutes and 27.36 seconds
Software version: pandas-profiling v2.10.0
Download configuration: config.yaml

Variables

step
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 743
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 243.3972456
Minimum: 1
Maximum: 743
Zeros: 0
Zeros (%): 0.0%
Memory size: 48.5 MiB

Quantile statistics

Minimum: 1
5th percentile: 16
Q1: 156
Median: 239
Q3: 335
95th percentile: 490
Maximum: 743
Range: 742
Interquartile range (IQR): 179

Descriptive statistics

Standard deviation: 142.331971
Coefficient of variation (CV): 0.5847723161
Kurtosis: 0.329070555
Mean: 243.3972456
Median Absolute Deviation (MAD): 92
Skewness: 0.3751768885
Sum: 1548644183
Variance: 20258.38998
Monotonicity: Increasing
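Most of the descriptive and quantile statistics listed for each numeric variable can be recomputed with standard-library Python. The sketch below is only an illustration and does not reproduce the profiler's own code. The helper name `describe` is hypothetical. It also assumes several definitions that this report does not confirm:

- the sample (n - 1) variance;
- quartiles by linear interpolation;
- MAD as the median of absolute deviations from the median.

Skewness and kurtosis are left out.

```python
import math
import statistics


def describe(values):
    """Recompute a subset of the profile's descriptive/quantile statistics.

    Assumed definitions (not confirmed by the report): sample variance with an
    n - 1 denominator, quartiles by linear interpolation ("inclusive" method),
    and MAD as the median absolute deviation from the median.
    """
    data = sorted(values)
    mean = statistics.fmean(data)
    variance = statistics.variance(data)        # sample variance (n - 1)
    std = math.sqrt(variance)
    median = statistics.median(data)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": mean,
        "std": std,
        "variance": variance,
        "cv": std / mean,                        # coefficient of variation
        "mad": statistics.median(abs(x - median) for x in data),
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": q3 - q1,                          # interquartile range
        "range": data[-1] - data[0],
        "sum": sum(data),
    }
```

Under these definitions the reported `step` figures are mutually consistent:

- CV = 142.331971 / 243.3972456 ≈ 0.5848.
- 142.331971² ≈ 20258.39, which matches the reported variance.
- IQR = 335 - 156 = 179.
- Range = 743 - 1 = 742.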

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
19 | 51352 | 0.8%
18 | 49579 | 0.8%
187 | 49083 | 0.8%
235 | 47491 | 0.7%
307 | 46968 | 0.7%
163 | 46352 | 0.7%
139 | 46054 | 0.7%
403 | 45155 | 0.7%
43 | 45060 | 0.7%
355 | 44787 | 0.7%
Other values (733) | 5890739 | 92.6%

Minimum 5 values
Value | Count | Frequency (%)
1 | 2708 | < 0.1%
2 | 1014 | < 0.1%
3 | 552 | < 0.1%
4 | 565 | < 0.1%
5 | 665 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
743 | 8 | < 0.1%
742 | 14 | < 0.1%
741 | 22 | < 0.1%
740 | 6 | < 0.1%
739 | 10 | < 0.1%

type
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 48.5 MiB
CASH_OUT: 2237500
PAYMENT: 2151495
CASH_IN: 1399284
TRANSFER: 532909
DEBIT: 41432

Length

Max length: 8
Median length: 7
Mean length: 7.422395963
Min length: 5

Characters and Unicode

Total characters: 47225885
Distinct characters: 18
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
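As a minimal illustration of counting characters by Unicode property, the sketch below tallies characters by general category using Python's standard `unicodedata` module. It is a stand-in, not the profiler's own implementation. The helper name `count_categories` is hypothetical, and the name mapping covers only the categories that occur in this report; any other code is kept as its two-letter abbreviation.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Pc": "Connector Punctuation",
    "Nd": "Decimal Number",
}


def count_categories(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for text in values:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts
```

For example, `count_categories(["CASH_OUT", "PAYMENT"])` counts 14 uppercase letters and 1 connector punctuation character (`_`). In the `type` column, the reported underscore count (3636784) equals the combined frequency of CASH_OUT (2237500) and CASH_IN (1399284). These are the only categories that contain `_`.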

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: PAYMENT
2nd row: PAYMENT
3rd row: TRANSFER
4th row: CASH_OUT
5th row: PAYMENT

Value | Count | Frequency (%)
CASH_OUT | 2237500 | 35.2%
PAYMENT | 2151495 | 33.8%
CASH_IN | 1399284 | 22.0%
TRANSFER | 532909 | 8.4%
DEBIT | 41432 | 0.7%

Histogram of lengths of the category

Words
Value | Count | Frequency (%)
cash_out | 2237500 | 35.2%
payment | 2151495 | 33.8%
cash_in | 1399284 | 22.0%
transfer | 532909 | 8.4%
debit | 41432 | 0.7%

Most occurring characters

Value | Count | Frequency (%)
A | 6321188 | 13.4%
T | 4963336 | 10.5%
S | 4169693 | 8.8%
N | 4083688 | 8.6%
C | 3636784 | 7.7%
H | 3636784 | 7.7%
_ | 3636784 | 7.7%
E | 2725836 | 5.8%
O | 2237500 | 4.7%
U | 2237500 | 4.7%
Other values (8) | 9576792 | 20.3%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 43589101 | 92.3%
Connector Punctuation | 3636784 | 7.7%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
A | 6321188 | 14.5%
T | 4963336 | 11.4%
S | 4169693 | 9.6%
N | 4083688 | 9.4%
C | 3636784 | 8.3%
H | 3636784 | 8.3%
E | 2725836 | 6.3%
O | 2237500 | 5.1%
U | 2237500 | 5.1%
P | 2151495 | 4.9%
Other values (7) | 7425297 | 17.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 3636784 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 43589101 | 92.3%
Common | 3636784 | 7.7%

Most frequent character per script

Latin
Value | Count | Frequency (%)
A | 6321188 | 14.5%
T | 4963336 | 11.4%
S | 4169693 | 9.6%
N | 4083688 | 9.4%
C | 3636784 | 8.3%
H | 3636784 | 8.3%
E | 2725836 | 6.3%
O | 2237500 | 5.1%
U | 2237500 | 5.1%
P | 2151495 | 4.9%
Other values (7) | 7425297 | 17.0%

Common
Value | Count | Frequency (%)
_ | 3636784 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 47225885 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
A | 6321188 | 13.4%
T | 4963336 | 10.5%
S | 4169693 | 8.8%
N | 4083688 | 8.6%
C | 3636784 | 7.7%
H | 3636784 | 7.7%
_ | 3636784 | 7.7%
E | 2725836 | 5.8%
O | 2237500 | 4.7%
U | 2237500 | 4.7%
Other values (8) | 9576792 | 20.3%

amount
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 5236933
Distinct (%): 82.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 179861.9036
Minimum: 0
Maximum: 92445520
Zeros: 16
Zeros (%): < 0.1%
Memory size: 48.5 MiB

Quantile statistics

Minimum: 0
5th percentile: 2224.0995
Q1: 13389.57
Median: 74871.94
Q3: 208721.4775
95th percentile: 518634.205
Maximum: 92445520
Range: 92445520
Interquartile range (IQR): 195331.9075

Descriptive statistics

Standard deviation: 603858.2319
Coefficient of variation (CV): 3.357343717
Kurtosis: 1797.956717
Mean: 179861.9036
Median Absolute Deviation (MAD): 68393.655
Skewness: 30.99394956
Sum: 1.144392945 × 10^12
Variance: 3.646447642 × 10^11
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
10000000 | 3207 | 0.1%
10000 | 88 | < 0.1%
5000 | 79 | < 0.1%
15000 | 68 | < 0.1%
500 | 65 | < 0.1%
100000 | 42 | < 0.1%
21500 | 37 | < 0.1%
120000 | 29 | < 0.1%
135000 | 20 | < 0.1%
0 | 16 | < 0.1%
Other values (5236923) | 6358969 | 99.9%

Minimum 5 values
Value | Count | Frequency (%)
0 | 16 | < 0.1%
0.01 | 1 | < 0.1%
0.02 | 3 | < 0.1%
0.03 | 2 | < 0.1%
0.04 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
92445520 | 1 | < 0.1%
73823490 | 1 | < 0.1%
71172480 | 1 | < 0.1%
69886730 | 1 | < 0.1%
69337320 | 1 | < 0.1%

name_orig
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 6353307
Distinct (%): 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 48.5 MiB
C400299098: 3
C1999539787: 3
C1784010646: 3
C1065307291: 3
C1462946854: 3
Other values (6353302): 6362605

Length

Max length: 11
Median length: 11
Mean length: 10.48232332
Min length: 5

Characters and Unicode

Total characters: 66695040
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 6344009
Unique (%): 99.7%

Sample

1st row: C1231006815
2nd row: C1666544295
3rd row: C1305486145
4th row: C840083671
5th row: C2048537720

Value | Count | Frequency (%)
C400299098 | 3 | < 0.1%
C1999539787 | 3 | < 0.1%
C1784010646 | 3 | < 0.1%
C1065307291 | 3 | < 0.1%
C1462946854 | 3 | < 0.1%
C363736674 | 3 | < 0.1%
C724452879 | 3 | < 0.1%
C1677795071 | 3 | < 0.1%
C1530544995 | 3 | < 0.1%
C2098525306 | 3 | < 0.1%
Other values (6353297) | 6362590 | > 99.9%

Histogram of lengths of the category

Words
Value | Count | Frequency (%)
c724452879 | 3 | < 0.1%
c1530544995 | 3 | < 0.1%
c1976208114 | 3 | < 0.1%
c1462946854 | 3 | < 0.1%
c1065307291 | 3 | < 0.1%
c1999539787 | 3 | < 0.1%
c1784010646 | 3 | < 0.1%
c363736674 | 3 | < 0.1%
c1832548028 | 3 | < 0.1%
c1677795071 | 3 | < 0.1%
Other values (6353297) | 6362590 | > 99.9%

Most occurring characters

Value | Count | Frequency (%)
1 | 8803448 | 13.2%
C | 6362620 | 9.5%
2 | 6136135 | 9.2%
3 | 5699596 | 8.5%
4 | 5693146 | 8.5%
7 | 5669437 | 8.5%
5 | 5668010 | 8.5%
6 | 5667725 | 8.5%
0 | 5667074 | 8.5%
9 | 5665212 | 8.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 60332420 | 90.5%
Uppercase Letter | 6362620 | 9.5%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 8803448 | 14.6%
2 | 6136135 | 10.2%
3 | 5699596 | 9.4%
4 | 5693146 | 9.4%
7 | 5669437 | 9.4%
5 | 5668010 | 9.4%
6 | 5667725 | 9.4%
0 | 5667074 | 9.4%
9 | 5665212 | 9.4%
8 | 5662637 | 9.4%

Uppercase Letter
Value | Count | Frequency (%)
C | 6362620 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 60332420 | 90.5%
Latin | 6362620 | 9.5%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 8803448 | 14.6%
2 | 6136135 | 10.2%
3 | 5699596 | 9.4%
4 | 5693146 | 9.4%
7 | 5669437 | 9.4%
5 | 5668010 | 9.4%
6 | 5667725 | 9.4%
0 | 5667074 | 9.4%
9 | 5665212 | 9.4%
8 | 5662637 | 9.4%

Latin
Value | Count | Frequency (%)
C | 6362620 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 66695040 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 8803448 | 13.2%
C | 6362620 | 9.5%
2 | 6136135 | 9.2%
3 | 5699596 | 8.5%
4 | 5693146 | 8.5%
7 | 5669437 | 8.5%
5 | 5668010 | 8.5%
6 | 5667725 | 8.5%
0 | 5667074 | 8.5%
9 | 5665212 | 8.5%

oldbalance_orig
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 1834373
Distinct (%): 28.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 833883.104
Minimum: 0
Maximum: 59585040
Zeros: 2102449
Zeros (%): 33.0%
Memory size: 48.5 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 14208
Q3: 107315.175
95th percentile: 5823702.1
Maximum: 59585040
Range: 59585040
Interquartile range (IQR): 107315.175

Descriptive statistics

Standard deviation: 2888242.673
Coefficient of variation (CV): 3.46360618
Kurtosis: 32.96487855
Mean: 833883.104
Median Absolute Deviation (MAD): 14208
Skewness: 5.249136421
Sum: 5.305681315 × 10^12
Variance: 8.341945738 × 10^12
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
0 | 2102449 | 33.0%
184 | 918 | < 0.1%
133 | 914 | < 0.1%
195 | 912 | < 0.1%
164 | 909 | < 0.1%
109 | 908 | < 0.1%
181 | 908 | < 0.1%
157 | 902 | < 0.1%
146 | 899 | < 0.1%
128 | 898 | < 0.1%
Other values (1834363) | 4252003 | 66.8%

Minimum 5 values
Value | Count | Frequency (%)
0 | 2102449 | 33.0%
0.05 | 1 | < 0.1%
0.18 | 1 | < 0.1%
0.21 | 1 | < 0.1%
0.44 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
59585040 | 1 | < 0.1%
57316256 | 1 | < 0.1%
50399044 | 1 | < 0.1%
49585040 | 1 | < 0.1%
47316256 | 1 | < 0.1%

newbalance_orig
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 2663280
Distinct (%): 41.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 855113.6686
Minimum: 0
Maximum: 49585040
Zeros: 3609566
Zeros (%): 56.7%
Memory size: 48.5 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 144258.4
95th percentile: 5980262.375
Maximum: 49585040
Range: 49585040
Interquartile range (IQR): 144258.4

Descriptive statistics

Standard deviation: 2924048.503
Coefficient of variation (CV): 3.419485164
Kurtosis: 32.06698456
Mean: 855113.6686
Median Absolute Deviation (MAD): 0
Skewness: 5.176884001
Sum: 5.44076333 × 10^12
Variance: 8.550059647 × 10^12
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
0 | 3609566 | 56.7%
944.55 | 4 | < 0.1%
9226.55 | 4 | < 0.1%
2987.95 | 4 | < 0.1%
5317.9 | 4 | < 0.1%
10093.85 | 4 | < 0.1%
19220.18 | 4 | < 0.1%
31285.61 | 4 | < 0.1%
2043.14 | 4 | < 0.1%
23623.99 | 4 | < 0.1%
Other values (2663270) | 2753018 | 43.3%

Minimum 5 values
Value | Count | Frequency (%)
0 | 3609566 | 56.7%
0.01 | 1 | < 0.1%
0.03 | 1 | < 0.1%
0.05 | 1 | < 0.1%
0.12 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
49585040 | 1 | < 0.1%
47316256 | 1 | < 0.1%
43686616 | 1 | < 0.1%
43673804 | 1 | < 0.1%
41690844 | 1 | < 0.1%

name_dest
Categorical

HIGH CARDINALITY

Distinct: 2722362
Distinct (%): 42.8%
Missing: 0
Missing (%): 0.0%
Memory size: 48.5 MiB
C1286084959: 113
C985934102: 109
C665576141: 105
C2083562754: 102
C248609774: 101
Other values (2722357): 6362090

Length

Max length: 11
Median length: 11
Mean length: 10.48175201
Min length: 2

Characters and Unicode

Total characters: 66691405
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 2262704
Unique (%): 35.6%

Sample

1st row: M1979787155
2nd row: M2044282225
3rd row: C553264065
4th row: C38997010
5th row: M1230701703

Value | Count | Frequency (%)
C1286084959 | 113 | < 0.1%
C985934102 | 109 | < 0.1%
C665576141 | 105 | < 0.1%
C2083562754 | 102 | < 0.1%
C248609774 | 101 | < 0.1%
C1590550415 | 101 | < 0.1%
C1789550256 | 99 | < 0.1%
C451111351 | 99 | < 0.1%
C1360767589 | 98 | < 0.1%
C1023714065 | 97 | < 0.1%
Other values (2722352) | 6361596 | > 99.9%

Histogram of lengths of the category

Words
Value | Count | Frequency (%)
c1286084959 | 113 | < 0.1%
c985934102 | 109 | < 0.1%
c665576141 | 105 | < 0.1%
c2083562754 | 102 | < 0.1%
c248609774 | 101 | < 0.1%
c1590550415 | 101 | < 0.1%
c1789550256 | 99 | < 0.1%
c451111351 | 99 | < 0.1%
c1360767589 | 98 | < 0.1%
c1023714065 | 97 | < 0.1%
Other values (2722352) | 6361596 | > 99.9%

Most occurring characters

Value | Count | Frequency (%)
1 | 8799996 | 13.2%
2 | 6133780 | 9.2%
3 | 5704404 | 8.6%
4 | 5691070 | 8.5%
8 | 5675627 | 8.5%
9 | 5668861 | 8.5%
7 | 5665128 | 8.5%
0 | 5664751 | 8.5%
6 | 5662897 | 8.5%
5 | 5662271 | 8.5%
Other values (2) | 6362620 | 9.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 60328785 | 90.5%
Uppercase Letter | 6362620 | 9.5%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 8799996 | 14.6%
2 | 6133780 | 10.2%
3 | 5704404 | 9.5%
4 | 5691070 | 9.4%
8 | 5675627 | 9.4%
9 | 5668861 | 9.4%
7 | 5665128 | 9.4%
0 | 5664751 | 9.4%
6 | 5662897 | 9.4%
5 | 5662271 | 9.4%

Uppercase Letter
Value | Count | Frequency (%)
C | 4211125 | 66.2%
M | 2151495 | 33.8%

Most occurring scripts

Value | Count | Frequency (%)
Common | 60328785 | 90.5%
Latin | 6362620 | 9.5%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 8799996 | 14.6%
2 | 6133780 | 10.2%
3 | 5704404 | 9.5%
4 | 5691070 | 9.4%
8 | 5675627 | 9.4%
9 | 5668861 | 9.4%
7 | 5665128 | 9.4%
0 | 5664751 | 9.4%
6 | 5662897 | 9.4%
5 | 5662271 | 9.4%

Latin
Value | Count | Frequency (%)
C | 4211125 | 66.2%
M | 2151495 | 33.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 66691405 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 8799996 | 13.2%
2 | 6133780 | 9.2%
3 | 5704404 | 8.6%
4 | 5691070 | 8.5%
8 | 5675627 | 8.5%
9 | 5668861 | 8.5%
7 | 5665128 | 8.5%
0 | 5664751 | 8.5%
6 | 5662897 | 8.5%
5 | 5662271 | 8.5%
Other values (2) | 6362620 | 9.5%

oldbalance_dest
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 3532215
Distinct (%): 55.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1100701.666
Minimum: 0
Maximum: 356015900
Zeros: 2704388
Zeros (%): 42.5%
Memory size: 48.5 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 132705.665
Q3: 943036.6875
95th percentile: 5147229.7
Maximum: 356015900
Range: 356015900
Interquartile range (IQR): 943036.6875

Descriptive statistics

Standard deviation: 3399180.111
Coefficient of variation (CV): 3.088193844
Kurtosis: 948.6741239
Mean: 1100701.666
Median Absolute Deviation (MAD): 132705.665
Skewness: 19.92175787
Sum: 7.003346437 × 10^12
Variance: 1.155442542 × 10^13
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
0 | 2704388 | 42.5%
10000000 | 615 | < 0.1%
20000000 | 219 | < 0.1%
30000000 | 86 | < 0.1%
40000000 | 31 | < 0.1%
102 | 21 | < 0.1%
198 | 19 | < 0.1%
160 | 18 | < 0.1%
125 | 18 | < 0.1%
132 | 18 | < 0.1%
Other values (3532205) | 3657187 | 57.5%

Minimum 5 values
Value | Count | Frequency (%)
0 | 2704388 | 42.5%
0.01 | 1 | < 0.1%
0.03 | 1 | < 0.1%
0.13 | 1 | < 0.1%
0.33 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
356015900 | 1 | < 0.1%
355553400 | 1 | < 0.1%
355381440 | 1 | < 0.1%
355380480 | 1 | < 0.1%
355185540 | 1 | < 0.1%

newbalance_dest
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 3474507
Distinct (%): 54.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1224996.398
Minimum: 0
Maximum: 356179260
Zeros: 2439433
Zeros (%): 38.3%
Memory size: 48.5 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 214661.445
Q3: 1111909.2
95th percentile: 5515715.975
Maximum: 356179260
Range: 356179260
Interquartile range (IQR): 1111909.2

Descriptive statistics

Standard deviation: 3674128.939
Coefficient of variation (CV): 2.999297748
Kurtosis: 862.1564998
Mean: 1224996.398
Median Absolute Deviation (MAD): 214661.445
Skewness: 19.35230197
Sum: 7.794186583 × 10^12
Variance: 1.349922346 × 10^13
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
0 | 2439433 | 38.3%
10000000 | 53 | < 0.1%
971418.94 | 32 | < 0.1%
19169204 | 29 | < 0.1%
1254956.1 | 25 | < 0.1%
16532032 | 25 | < 0.1%
1412484.1 | 22 | < 0.1%
1178808.1 | 21 | < 0.1%
4743010.5 | 21 | < 0.1%
7364725 | 21 | < 0.1%
Other values (3474497) | 3922938 | 61.7%

Minimum 5 values
Value | Count | Frequency (%)
0 | 2439433 | 38.3%
0.01 | 1 | < 0.1%
0.33 | 1 | < 0.1%
1.39 | 1 | < 0.1%
1.64 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
356179260 | 1 | < 0.1%
356015900 | 1 | < 0.1%
355553400 | 2 | < 0.1%
355381440 | 1 | < 0.1%
355380480 | 1 | < 0.1%

is_fraud
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 48.5 MiB
0: 6354407
1: 8213

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 6362620
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 1
4th row: 1
5th row: 0

Value | Count | Frequency (%)
0 | 6354407 | 99.9%
1 | 8213 | 0.1%

Histogram of lengths of the category

Words
Value | Count | Frequency (%)
0 | 6354407 | 99.9%
1 | 8213 | 0.1%

Most occurring characters

Value | Count | Frequency (%)
0 | 6354407 | 99.9%
1 | 8213 | 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 6362620 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 6354407 | 99.9%
1 | 8213 | 0.1%

Most occurring scripts

Value | Count | Frequency (%)
Common | 6362620 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 6354407 | 99.9%
1 | 8213 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 6362620 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 6354407 | 99.9%
1 | 8213 | 0.1%

is_flagged_fraud
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 48.5 MiB
0: 6362604
1: 16

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 6362620
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Value | Count | Frequency (%)
0 | 6362604 | > 99.9%
1 | 16 | < 0.1%

Histogram of lengths of the category

Words
Value | Count | Frequency (%)
0 | 6362604 | > 99.9%
1 | 16 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
0 | 6362604 | > 99.9%
1 | 16 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 6362620 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 6362604 | > 99.9%
1 | 16 | < 0.1%

Most occurring scripts

Value | Count | Frequency (%)
Common | 6362620 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 6362604 | > 99.9%
1 | 16 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 6362620 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 6362604 | > 99.9%
1 | 16 | < 0.1%

error_orig
Real number (ℝ)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 4664191
Distinct (%): 73.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -201092.4681
Minimum: -92445520
Maximum: 4.24
Zeros: 414047
Zeros (%): 6.5%
Memory size: 48.5 MiB

Quantile statistics

Minimum: -92445520
5th percentile: -700716.6235
Q1: -249641.0825
Median: -68677.255
Q3: -2954.1975
95th percentile: 2.728484105 × 10^-12
Maximum: 4.24
Range: 92445524.24
Interquartile range (IQR): 246686.885

Descriptive statistics

Standard deviation: 606650.4605
Coefficient of variation (CV): -3.016773658
Kurtosis: 1753.268489
Mean: -201092.4681
Median Absolute Deviation (MAD): 68677.255
Skewness: -30.07474652
Sum: -1.279474959 × 10^12
Variance: 3.680247813 × 10^11
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
0 | 414047 | 6.5%
1.818989404 × 10^-12 | 49996 | 0.8%
-1.818989404 × 10^-12 | 49909 | 0.8%
3.637978807 × 10^-12 | 42006 | 0.7%
-3.637978807 × 10^-12 | 41919 | 0.7%
-9.094947018 × 10^-13 | 35430 | 0.6%
9.094947018 × 10^-13 | 35357 | 0.6%
-7.275957614 × 10^-12 | 19682 | 0.3%
7.275957614 × 10^-12 | 19631 | 0.3%
-4.547473509 × 10^-13 | 16651 | 0.3%
Other values (4664181) | 5637992 | 88.6%

Minimum 5 values
Value | Count | Frequency (%)
-92445520 | 1 | < 0.1%
-73823490 | 1 | < 0.1%
-71172480 | 1 | < 0.1%
-69886730 | 1 | < 0.1%
-69337320 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
4.24 | 1 | < 0.1%
2.83 | 1 | < 0.1%
2.53 | 1 | < 0.1%
2 | 1 | < 0.1%
1.98 | 1 | < 0.1%

error_dest
Real number (ℝ)

SKEWED
ZEROS

Distinct: 3171375
Distinct (%): 49.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 55567.17192
Minimum: -75885718
Maximum: 13191233.77
Zeros: 719354
Zeros (%): 11.3%
Memory size: 48.5 MiB

Quantile statistics

Minimum: -75885718
5th percentile: -0.1500000001
Q1: 0
Median: 3500.49
Q3: 29353.0175
95th percentile: 456741
Maximum: 13191233.77
Range: 89076951.77
Interquartile range (IQR): 29353.0175

Descriptive statistics

Standard deviation: 441528.7676
Coefficient of variation (CV): 7.945856381
Kurtosis: 4190.553496
Mean: 55567.17192
Median Absolute Deviation (MAD): 3500.59
Skewness: -49.20227538
Sum: 3.535527994 × 10^11
Variance: 1.949476526 × 10^11
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
0 | 719354 | 11.3%
-0.01000000001 | 128484 | 2.0%
0.01000000001 | 128402 | 2.0%
-0.02000000002 | 98015 | 1.5%
0.02000000002 | 97753 | 1.5%
0.03000000003 | 49378 | 0.8%
-0.03000000003 | 49230 | 0.8%
-0.04000000004 | 43796 | 0.7%
0.04000000004 | 43743 | 0.7%
-0.06000000006 | 28766 | 0.5%
Other values (3171365) | 4975699 | 78.2%

Minimum 5 values
Value | Count | Frequency (%)
-75885718 | 1 | < 0.1%
-75549130 | 1 | < 0.1%
-72830306.45 | 1 | < 0.1%
-67020175.56 | 1 | < 0.1%
-64707849.24 | 1 | < 0.1%
ValueCountFrequency (%)
13191233.771
 
< 0.1%
10000000145
< 0.1%
99968871
 
< 0.1%
99777611
 
< 0.1%
9977760.61
 
< 0.1%

dest_type
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 48.5 MiB
C: 4211125
M: 2151495

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 6362620
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: M
2nd row: M
3rd row: C
4th row: C
5th row: M

Value | Count | Frequency (%)
C | 4211125 | 66.2%
M | 2151495 | 33.8%

Histogram of lengths of the category

Words
Value | Count | Frequency (%)
c | 4211125 | 66.2%
m | 2151495 | 33.8%

Most occurring characters

Value | Count | Frequency (%)
C | 4211125 | 66.2%
M | 2151495 | 33.8%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 6362620 | 100.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
C | 4211125 | 66.2%
M | 2151495 | 33.8%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 6362620 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
C | 4211125 | 66.2%
M | 2151495 | 33.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 6362620 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
C | 4211125 | 66.2%
M | 2151495 | 33.8%

days
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 31
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.49190679
Minimum: 1
Maximum: 31
Zeros: 0
Zeros (%): 0.0%
Memory size: 48.5 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 7
Median: 10
Q3: 14
95th percentile: 21
Maximum: 31
Range: 30
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 5.921812248
Coefficient of variation (CV): 0.5644171612
Kurtosis: 0.3323014534
Mean: 10.49190679
Median Absolute Deviation (MAD): 4
Skewness: 0.3778477393
Sum: 66756016
Variance: 35.0678603
Monotonicity: Increasing

Histogram with fixed size bins (bins=31)

Common values
Value | Count | Frequency (%)
1 | 574255 | 9.0%
2 | 455238 | 7.2%
8 | 449637 | 7.1%
6 | 441005 | 6.9%
13 | 428583 | 6.7%
17 | 425766 | 6.7%
7 | 420583 | 6.6%
9 | 417919 | 6.6%
11 | 417859 | 6.6%
15 | 401282 | 6.3%
Other values (21) | 1930493 | 30.3%

Minimum 5 values
Value | Count | Frequency (%)
1 | 574255 | 9.0%
2 | 455238 | 7.2%
3 | 1070 | < 0.1%
4 | 28240 | 0.4%
5 | 9789 | 0.2%

Maximum 5 values
Value | Count | Frequency (%)
31 | 272 | < 0.1%
30 | 11287 | 0.2%
29 | 54890 | 0.9%
28 | 14661 | 0.2%
27 | 8578 | 0.1%

Interactions

[Figure: 72 pairwise interaction plots of the numeric variables]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and positive scale of the two variables. For an exactly linear relationship, therefore, the slope of the line (its angle to the x-axis) does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
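The definition above can be turned directly into code. The following is a minimal standard-library sketch; the function name `pearson_r` is ours and is not a profiler API. It divides the covariance by the product of the standard deviations. The 1/(n - 1) normalisation factors are left out because they cancel.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/(n - 1) factors of the covariance and of both standard deviations cancel.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

For an exactly linear relation such as `pearson_r([1, 2, 3, 4], [2, 4, 6, 8])`, the result is 1.0. Rescaling `y` by a positive factor or shifting it leaves r unchanged, while a negative factor flips its sign.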

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation and 1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
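As a sketch, the rank-based definition can be written as Pearson's r computed on ranks. Ties here receive the average of the ranks they span (pandas' Series.rank with method="average"), which is the usual convention; how the report itself ranks ties is assumed rather than stated. spearman_rho is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables of x and y."""
    # Rank each variable; tied values share the average of the ranks they span
    rx = pd.Series(x, dtype=float).rank(method="average").to_numpy()
    ry = pd.Series(y, dtype=float).rank(method="average").to_numpy()
    # np.corrcoef returns the 2x2 Pearson correlation matrix of the two rank vectors
    return float(np.corrcoef(rx, ry)[0, 1])

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear
print(spearman_rho(x, y), np.corrcoef(x, y)[0, 1])
```

For a monotonic but nonlinear relationship such as y = x³, ρ equals 1 while Pearson's r stays below 1, which is the advantage described above.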

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation and 1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2 for n observations. This form is known as τ_a; variants such as τ_b adjust the denominator for ties.
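A minimal sketch of the pair-counting definition, written as a plain O(n²) loop for clarity: fine for small inputs, far too slow for the 6,362,620 rows of this dataset. It computes τ_a exactly as described; kendall_tau_a is an illustrative name, and library routines such as pandas' DataFrame.corr(method="kendall") may report the tie-adjusted τ_b instead.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        # Concordant: x and y move in the same direction; discordant: opposite directions
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # s == 0 means a tie in x or y: the pair counts as neither
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# 5 concordant and 1 discordant pair out of 6 pairs -> (5 - 1) / 6
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```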

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013, which can be found here.
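A sketch of the bias-corrected estimator as it is commonly stated from Bergsma (2013): phi² = χ²/n is reduced by (k-1)(r-1)/(n-1), the table dimensions r and k are corrected in the same spirit, and V = sqrt(phi²_corr / min(k_corr - 1, r_corr - 1)). It builds the contingency table with pandas.crosstab and computes the Pearson χ² statistic without continuity correction. cramers_v_corrected is a hypothetical name, and whether pandas-profiling's own implementation matches it in every detail (for example, its handling of degenerate tables) is an assumption.

```python
import numpy as np
import pandas as pd

def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two nominal variables (Bergsma 2013 correction)."""
    # r x k contingency table of observed counts
    table = pd.crosstab(pd.Series(x), pd.Series(y)).to_numpy(dtype=float)
    n = table.sum()
    r, k = table.shape
    if n < 2:
        return float("nan")
    # Pearson chi-squared statistic: sum of (observed - expected)^2 / expected
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    phi2 = chi2 / n
    # Bias corrections for phi^2 and for the table dimensions
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        # A variable with a single category: association is undefined
        return float("nan")
    return float(np.sqrt(phi2_corr / denom))

x = ["a"] * 50 + ["b"] * 50
y_perfect = ["u"] * 50 + ["v"] * 50          # y is fully determined by x
y_indep = (["u"] * 25 + ["v"] * 25) * 2      # every cell of the 2x2 table holds 25
print(cramers_v_corrected(x, y_perfect), cramers_v_corrected(x, y_indep))
```

Clipping phi²_corr at zero keeps the estimate inside the 0 to 1 range described above even when the raw correction overshoots.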

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
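The figures are not reproduced here, but the data behind them is simple to compute. A minimal pandas sketch on a small hypothetical frame (this dataset itself has no missing cells): isna() yields the nullity matrix, one boolean per cell, and summing it by column gives the per-column counts that a nullity-by-column chart displays.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a few missing cells
df = pd.DataFrame({
    "amount": [10.0, np.nan, 30.0, 40.0],
    "type": ["A", "A", None, "B"],
    "step": [1, 1, 1, 2],
})

# Nullity matrix: True where the value is missing
nullity = df.isna()

# Per-column summaries of nullity
missing_per_column = nullity.sum()
present_per_column = df.notna().sum()

print(nullity)
print(missing_per_column)
```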

Sample

First rows

   step  type      amount      name_orig    oldbalance_orig  newbalance_orig  name_dest    oldbalance_dest  newbalance_dest  is_fraud  is_flagged_fraud  error_orig     error_dest  dest_type  days
0  1     PAYMENT   9839.64     C1231006815  170136.00        160296.36        M1979787155  0.0              0.00             0         0                 1.455192e-11   9839.64     M          1
1  1     PAYMENT   1864.28     C1666544295  21249.00         19384.72         M2044282225  0.0              0.00             0         0                 -1.136868e-12  1864.28     M          1
2  1     TRANSFER  181.00      C1305486145  181.00           0.00             C553264065   0.0              0.00             1         0                 0.000000e+00   181.00      C          1
3  1     CASH_OUT  181.00      C840083671   181.00           0.00             C38997010    21182.0          0.00             1         0                 0.000000e+00   21363.00    C          1
4  1     PAYMENT   11668.14    C2048537720  41554.00         29885.86         M1230701703  0.0              0.00             0         0                 0.000000e+00   11668.14    M          1
5  1     PAYMENT   7817.71     C90045638    53860.00         46042.29         M573487274   0.0              0.00             0         0                 -9.094947e-13  7817.71     M          1
6  1     PAYMENT   7107.77     C154988899   183195.00        176087.23        M408069119   0.0              0.00             0         0                 -1.091394e-11  7107.77     M          1
7  1     PAYMENT   7861.64     C1912850431  176087.23        168225.60        M633326333   0.0              0.00             0         0                 -1.000000e-02  7861.64     M          1
8  1     PAYMENT   4024.36     C1265012928  2671.00          0.00             M1176932104  0.0              0.00             0         0                 -1.353360e+03  4024.36     M          1
9  1     DEBIT     5337.77     C712410124   41720.00         36382.23         C195600860   41898.0          40348.79         0         0                 -3.637979e-12  6886.98     C          1

Last rows

         step  type      amount      name_orig    oldbalance_orig  newbalance_orig  name_dest    oldbalance_dest  newbalance_dest  is_fraud  is_flagged_fraud  error_orig     error_dest  dest_type  days
6362610  742   TRANSFER  63416.99    C778071008   63416.99         0.0              C1812552860  0.00             0.00             1         0                 0.0            63416.99    C          31
6362611  742   CASH_OUT  63416.99    C994950684   63416.99         0.0              C1662241365  276433.20        339850.16        1         0                 0.0            0.03        C          31
6362612  743   TRANSFER  1258818.90  C1531301470  1258818.90       0.0              C1470998563  0.00             0.00             1         0                 0.0            1258818.90  C          31
6362613  743   CASH_OUT  1258818.90  C1436118706  1258818.90       0.0              C1240760502  503464.50        1762283.40       1         0                 0.0            0.00        C          31
6362614  743   TRANSFER  339682.12   C2013999242  339682.12        0.0              C1850423904  0.00             0.00             1         0                 0.0            339682.12   C          31
6362615  743   CASH_OUT  339682.12   C786484425   339682.12        0.0              C776919290   0.00             339682.12        1         0                 0.0            0.00        C          31
6362616  743   TRANSFER  6311409.50  C1529008245  6311409.50       0.0              C1881841831  0.00             0.00             1         0                 0.0            6311409.50  C          31
6362617  743   CASH_OUT  6311409.50  C1162922333  6311409.50       0.0              C1365125890  68488.84         6379898.00       1         0                 0.0            0.34        C          31
6362618  743   TRANSFER  850002.50   C1685995037  850002.50        0.0              C2080388513  0.00             0.00             1         0                 0.0            850002.50   C          31
6362619  743   CASH_OUT  850002.50   C1280323807  850002.50        0.0              C873221189   6510099.00       7360101.50       1         0                 0.0            0.00        C          31
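The sample rows suggest how the derived error_orig and error_dest columns were built, although the report does not say so: they match balance-reconciliation residuals, error_orig = oldbalance_orig - amount - newbalance_orig and error_dest = oldbalance_dest + amount - newbalance_dest. The sketch below is an inference checked only against four rows copied from the tables above (indices 3, 8, 9 and 6362617), and balance_errors is a hypothetical helper. Tiny values such as 1.455192e-11 in the report are consistent with floating-point rounding of a zero residual.

```python
import pandas as pd

def balance_errors(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical reconstruction of error_orig / error_dest from the balance columns."""
    return pd.DataFrame(
        {
            # (drop in origin balance) - amount: zero when the origin side reconciles
            "error_orig": df["oldbalance_orig"] - df["amount"] - df["newbalance_orig"],
            # amount - (rise in destination balance): zero when the destination side reconciles
            "error_dest": df["oldbalance_dest"] + df["amount"] - df["newbalance_dest"],
        },
        index=df.index,
    )

# Balance columns of four rows copied from the sample tables
sample = pd.DataFrame(
    {
        "amount": [181.00, 4024.36, 5337.77, 6311409.50],
        "oldbalance_orig": [181.00, 2671.00, 41720.00, 6311409.50],
        "newbalance_orig": [0.00, 0.00, 36382.23, 0.0],
        "oldbalance_dest": [21182.0, 0.0, 41898.0, 68488.84],
        "newbalance_dest": [0.00, 0.00, 40348.79, 6379898.00],
    },
    index=[3, 8, 9, 6362617],
)
# The error columns the report shows for the same rows
reported = pd.DataFrame(
    {
        "error_orig": [0.0, -1353.36, -3.637979e-12, 0.0],
        "error_dest": [21363.00, 4024.36, 6886.98, 0.34],
    },
    index=sample.index,
)

recomputed = balance_errors(sample)
# Remaining differences are floating-point noise only
print((recomputed - reported).abs().max())
```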